Checkpoints:
Process View
- Potential race conditions (process competition for critical
resources) have been identified and avoidance and resolution strategies
have been defined.
- There is a defined strategy for handling "I/O queue
full" or "buffer full" conditions.
- The system monitors itself (capacity threshold, critical
performance threshold, resource exhaustion) and is capable of taking
corrective action when a problem is detected.
- Response time requirements for each message have been
identified.
- There is a diagnostic mode for the system which allows
message response times to be measured.
- The nominal and maximal performance requirements for
important operations have been specified.
- There is a set of performance tests capable of measuring
whether performance requirements have been met.
- The performance tests cover the "extra-normal"
behavior of the system (startup and shutdown, alternate and exceptional
flows of events of the use cases, system failure modes).
- Architectural weaknesses creating the potential for
performance bottlenecks have been identified. Particular emphasis has been
given to:
  - Use of some finite shared resource such as (but not limited
  to) semaphores, file handles, locks, latches, shared memory, etc.
  - Inter-process communication. Communication across process
  boundaries is always more expensive than in-process communication.
  - Inter-processor communication. Communication across processor
  boundaries is always more expensive than inter-process communication.
  - Physical and virtual memory usage; the point at which the
  system runs out of physical memory and starts using virtual memory is a
  point at which performance usually drops precipitously.
- Where there are primary and backup processes, the potential
for more than one process believing that it is primary (or no process
believing that it is primary) has been considered and specific design
actions have been taken to resolve the conflict.
- There are external processes that will restore the system to
a consistent state when an event like a process failure leaves the system
in an inconsistent state.
- The system is tolerant of errors and exceptions, such that when
an error or exception occurs, the system can revert to a consistent state.
- Diagnostic tests can be executed while the system is running.
- The system can be upgraded (hardware, software) while it is
running, if required.
- There is a consistent policy for handling alarms in the
system, and the policy has been consistently applied. The alarm policy
addresses:
  - the "sensitivity" of the alarm reporting mechanism;
  - the prevention of false or redundant alarms;
  - the training and user interface requirements of staff who
  will use the alarm reporting mechanism.
- The performance impact (process cycles, memory, etc.) of the
alarm reporting mechanism has been assessed and falls within acceptable
performance thresholds as established in the performance requirements.
- The workload/performance requirements have been examined and
have been satisfied. In the case where the performance requirements are
unrealistic, they have been re-negotiated.
- Memory budgets, to the extent that they exist, have been
identified and the software has been verified to meet those requirements.
Measures have been taken to detect and prevent memory leaks.
- A policy exists for use of the virtual memory system,
including how to monitor and tune its usage.
- Processes are sufficiently independent of one another that
they can be distributed across processors or nodes when required.
- Processes which must remain co-located (because of
performance and throughput requirements, or because the inter-process
communication mechanism, e.g. semaphores or shared memory, requires it)
have been identified, and the impact of not being able to distribute this
workload has been taken into consideration.
- Messages which can be made asynchronous, so that they can be
processed when resources are more available, have been identified.
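
As an illustration of the checkpoints on "buffer full" handling and
self-monitoring against a capacity threshold, the following is a minimal
Python sketch and not a prescribed design. The class name BoundedInbox, its
methods, and the parameters capacity, high_water and put_timeout are all
hypothetical; the overflow policy chosen here (wait briefly, then reject and
count the message) is only one of several reasonable strategies.

```python
import queue


class BoundedInbox:
    """Hypothetical bounded message buffer with an explicit 'buffer full' policy."""

    def __init__(self, capacity, high_water=0.8, put_timeout=0.05):
        # Assumption: a fixed capacity stands in for the system's real I/O queue.
        self._q = queue.Queue(maxsize=capacity)
        self._capacity = capacity
        self._high_water = high_water      # fraction of capacity that triggers a warning
        self._put_timeout = put_timeout    # seconds to wait before giving up on a full buffer
        self.rejected = 0                  # messages refused because the buffer stayed full

    def offer(self, message):
        """Try to enqueue; return False instead of blocking forever when full."""
        try:
            self._q.put(message, timeout=self._put_timeout)
            return True
        except queue.Full:
            # Defined policy for this sketch: reject the message and count it,
            # so that a monitor or alarm mechanism can react.
            self.rejected += 1
            return False

    def take(self):
        """Remove and return the oldest message; raises queue.Empty if none."""
        return self._q.get_nowait()

    def above_threshold(self):
        """Self-monitoring hook: True once the fill level reaches the high-water mark."""
        return self._q.qsize() >= self._high_water * self._capacity


if __name__ == "__main__":
    inbox = BoundedInbox(capacity=4, high_water=0.75, put_timeout=0.01)
    accepted = [inbox.offer(n) for n in range(6)]
    print(accepted, inbox.rejected, inbox.above_threshold())
```

Rejecting with a counter keeps the producer from stalling indefinitely and
gives the monitoring side a concrete signal. A real system might instead
drop the oldest message, spill to disk, or apply back-pressure to the sender.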
Copyright © 1987 - 2000 Rational Software Corporation